Optimal Canonization of All Substrings of a String

Authors

  • Alberto Apostolico
  • Maxime Crochemore

Abstract

Any word can be decomposed uniquely into lexicographically nonincreasing factors, each of which is a Lyndon word. This paper addresses the relationship between the Lyndon decomposition of a word x and a canonical rotation of x, i.e., a rotation w of x that is lexicographically smallest among all rotations of x. The main combinatorial result is a characterization of the Lyndon factor of x with which w must start. As an application, faster on-line algorithms for finding the canonical rotation(s) of x are developed by nontrivial extension of known Lyndon factorization strategies. Unlike their predecessors, the new algorithms lend themselves to incremental variants that compute, in linear time, the canonical rotations of all prefixes of x. The fastest such variant represents the main algorithmic contribution of the paper. It performs within the same 3|x| character-comparison bound as the fastest previous on-line algorithms for the canonization of a single string. This leads to the canonization of all substrings of a string in optimal quadratic time, within fewer than 3|x|² character comparisons and using linear auxiliary space.
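To make the link between Lyndon factorization and canonical rotation concrete, here is a small Python sketch built on Duval's classical factorization algorithm and the well-known Duval-based least-rotation routine that runs the same loop over x·x. Neither is the algorithm of this paper, and the function names (`lyndon_factorization`, `least_rotation`, `prefix_canonical_rotations`) are hypothetical. The last helper is a naive baseline that recomputes every prefix from scratch, so it takes O(|x|²) time overall rather than the linear time the paper obtains for all prefixes.

```python
def lyndon_factorization(s):
    """Duval's algorithm: split s into Lyndon words w1 >= w2 >= ... >= wk."""
    n, i = len(s), 0
    factors = []
    while i < n:
        # Invariant: s[i:j] = w^t w' with w = s[i:i + j - k] a Lyndon word
        # and w' a proper prefix of w.
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit every complete copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def least_rotation(x):
    """Lexicographically smallest rotation of x, using Duval's loop on x + x."""
    s, n = x + x, len(x)
    i = start = 0
    while i < n:
        # Remember where this group of equal Lyndon factors of x + x begins;
        # the last such start inside the first |x| positions is the answer.
        start = i
        j, k = i + 1, i
        while j < 2 * n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            i += j - k
    return s[start:start + n]


def prefix_canonical_rotations(x):
    """Naive baseline: canonical rotation of every prefix, O(|x|^2) time in total."""
    return [least_rotation(x[:m]) for m in range(1, len(x) + 1)]
```

For example, `lyndon_factorization("banana")` gives `['b', 'an', 'an', 'a']` and `least_rotation("banana")` gives `'abanan'`, which starts at the beginning of the factor `a` of `banana` itself.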

Similar resources

Minimum Unique Substrings and Maximum Repeats

Unique substrings appear scattered in the stringology literature and have important applications in bioinformatics. In this paper we initiate a study of minimum unique substrings in a given string; that is, substrings that occur exactly once while all their substrings are repeats. We discover a strong duality between minimum unique substrings and maximum repeats which, in particular, allows fas...
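The definition in this abstract (a substring that occurs exactly once while all of its proper substrings are repeats) can be checked directly. The brute-force Python sketch below is hypothetical and much slower than the approach the abstract describes; it does not use the duality with maximum repeats. It relies on one simple fact: every substring of a repeat is itself a repeat, so it is enough to test the longest proper prefix and the longest proper suffix of a unique substring.

```python
def occurrences(s, t):
    """Number of (possibly overlapping) occurrences of t in s."""
    return sum(1 for i in range(len(s) - len(t) + 1) if s.startswith(t, i))


def minimum_unique_substrings(s):
    """Half-open intervals (i, j) such that s[i:j] is a minimum unique substring."""
    result = []
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            t = s[i:j]
            if occurrences(s, t) == 1:
                # Checking the longest proper prefix and suffix covers all
                # proper substrings of t.
                if len(t) == 1 or (occurrences(s, t[:-1]) >= 2
                                   and occurrences(s, t[1:]) >= 2):
                    result.append((i, j))
                # Longer substrings starting at i contain t, so none is minimal.
                break
    return result
```

For instance, in `abab` the only minimum unique substring is `ba`, reported as `(1, 3)`.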

Efficient Approximate and Dynamic Matching of Patterns Using a Labeling Paradigm

A key approach in string processing algorithmics has been the labeling paradigm [KMR72], which is based on assigning labels to some of the substrings of a given string. If these labels are chosen consistently, they can enable fast comparisons of substrings. Until the first optimal parallel algorithm for suffix tree construction was given in [SV94], the labeling paradigm was considered not to be compe...
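The labeling idea can be illustrated with a sequential doubling construction in the spirit of [KMR72]: every substring of length 2^k gets an integer name, equal substrings share a name, and names for length 2^(k+1) are derived from pairs of names for length 2^k. The Python below is a hypothetical, sorting-based sketch (about O(n log² n) time), not the paper's algorithm; `kmr_names` and `equal_substrings` are illustrative names. Because names are ranks of sorted pairs, they also preserve lexicographic order among substrings of the same length.

```python
def kmr_names(s):
    """levels[k][i] is an integer name for s[i:i + 2**k]; equal substrings share a name."""
    rank = {c: r for r, c in enumerate(sorted(set(s)))}
    levels = [[rank[c] for c in s]]
    half = 1
    while 2 * half <= len(s):
        prev = levels[-1]
        # A block of length 2*half is named by the pair of names of its two halves.
        pairs = [(prev[i], prev[i + half]) for i in range(len(s) - 2 * half + 1)]
        names = {p: r for r, p in enumerate(sorted(set(pairs)))}
        levels.append([names[p] for p in pairs])
        half *= 2
    return levels


def equal_substrings(levels, i, j, m):
    """Compare s[i:i+m] and s[j:j+m] (both in range) with two overlapping named blocks."""
    if m == 0:
        return True
    k = m.bit_length() - 1      # largest k with 2**k <= m
    block = 1 << k
    names = levels[k]
    return names[i] == names[j] and names[i + m - block] == names[j + m - block]
```

After the tables are built, each comparison costs O(1) name lookups, which is the kind of fast substring comparison the abstract refers to.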

Weak Repetitions in Strings

A weak repetition in a string consists of two or more adjacent substrings which are permutations of each other. We describe a straightforward Θ(n²) algorithm which computes all the weak repetitions in a given string of length n defined on an arbitrary alphabet A. Using results on Fibonacci and other simple strings, we prove that this algorithm is asymptotically optimal over all known encodings o...
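A simple quadratic scan for the two-block case (two adjacent substrings of equal length that are permutations of each other) is sketched below in Python. It is hypothetical and not claimed to be the paper's algorithm. For each block length l it slides a window of length 2l and keeps, per character, the count in the left half minus the count in the right half; the halves are permutations of each other exactly when every difference is zero. Each length costs O(n) updates, so the scan takes O(n²) time. A weak repetition of k ≥ 2 blocks of length l starting at i shows up as the consecutive pairs (i, l), (i + l, l), and so on up to (i + (k − 2)l, l).

```python
from collections import defaultdict


def abelian_squares(s):
    """Pairs (i, l) such that s[i:i+l] and s[i+l:i+2l] are permutations of each other."""
    n = len(s)
    out = []
    for l in range(1, n // 2 + 1):
        diff = defaultdict(int)  # count in left half minus count in right half
        for c in s[:l]:
            diff[c] += 1
        for c in s[l:2 * l]:
            diff[c] -= 1
        nonzero = sum(1 for v in diff.values() if v != 0)
        i = 0
        while True:
            if nonzero == 0:
                out.append((i, l))
            if i + 2 * l >= n:
                break
            # Slide by one: s[i] leaves the left half, s[i+l] moves from the
            # right half to the left half, and s[i+2l] enters the right half.
            for c, delta in ((s[i], -1), (s[i + l], 2), (s[i + 2 * l], -1)):
                before = diff[c]
                diff[c] += delta
                nonzero += (diff[c] != 0) - (before != 0)
            i += 1
    return out
```

For example, `abelian_squares("abba")` reports `(1, 1)` for `b|b` and `(0, 2)` for `ab|ba`.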

The Efficient Computation of Complete and Concise Substring Scales with Suffix Trees

Strings are an important part of most real application multivalued contexts. Their conceptual treatment requires the definition of substring scales, i.e., sets of relevant substrings, so as to form informative concepts. However these scales are either defined by hand, or derived in a context-unaware manner (e.g., all words occurring in string values). We present an efficient algorithm based on s...

A New Family of String Classifiers Based on Local Relatedness

This paper introduces a new family of string classifiers based on local relatedness. We use three types of local relatedness measurements, namely, longest common substrings (LCStr’s), longest common subsequences (LCSeq’s), and window-accumulated longest common subsequences (wLCSeq’s). We show that finding the optimal classifier for two given sets of strings (the positive set and the negative set)...
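For reference, the first two relatedness measures named here have textbook dynamic-programming solutions. The Python below is a hypothetical sketch of those measures only; it does not implement the window-accumulated variant, the classifiers, or the optimization studied in the paper. Both functions run in O(|a|·|b|) time and keep a single DP row at a time.

```python
def longest_common_substring(a, b):
    """Longest contiguous string occurring in both a and b (first one found in a)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # prev[j]: length of common suffix of a[:i-1] and b[:j]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def longest_common_subsequence_length(a, b):
    """Length of a longest (not necessarily contiguous) common subsequence."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

For example, `longest_common_substring("xabcdy", "zzabcq")` is `"abc"`, while `longest_common_subsequence_length("ABCBDAB", "BDCABA")` is 4.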

Journal:
  • Inf. Comput.

Volume: 95

Publication date: 1991